LASSO-derived model for the prediction of lean-non-alcoholic fatty liver disease in examinees attending a routine health check-up

Abstract Background Lean individuals with non-alcoholic fatty liver disease (NAFLD) often have normal body size but abnormal visceral fat. Therefore, an alternative to body mass index (BMI) should be considered for the prediction of lean-NAFLD. This study aimed to combine representative visceral fat measures with other laboratory parameters, using the least absolute shrinkage and selection operator (LASSO) method, to construct a predictive model for lean-NAFLD. Methods This retrospective cross-sectional analysis enrolled 2325 subjects with BMI < 24 kg/m² from the medical records of 51,271 examinees who underwent a routine health check-up. They were randomly divided into training and validation cohorts at a ratio of 1:1. The prediction model was built by applying LASSO regression to 23 clinical and laboratory factors. Discrimination was evaluated using the area under the curve (AUC), and calibration using the Hosmer–Lemeshow test and calibration curves. The performance of the LASSO model was compared with that of the fatty liver index (FLI) model. Results The LASSO-derived model included four variables (visceral fat, triglyceride levels, HDL-C levels, and waist-hip ratio) and showed high discriminatory ability in predicting lean-NAFLD (AUC, 0.8416; 95% CI: 0.811–0.872), comparable with that of the FLI model. Using a cut-off of 0.1484, the LASSO model achieved moderate sensitivity (75.69%) and specificity (79.86%), as well as a high negative predictive value (95.9%). In addition, in a subgroup analysis of subjects with normal waist circumference (WC), the LASSO model showed a trend toward higher accuracy than the FLI (cut-off 15.45). Conclusions We developed a LASSO-derived predictive model with the potential for use as an alternative tool for predicting lean-NAFLD in clinical settings.


Introduction
Non-alcoholic fatty liver disease (NAFLD) is a prevalent liver disease worldwide, with a global prevalence of 25.24% and an estimated prevalence of 27.4% in Asia according to one meta-analysis [1]. With the increasing number of people with obesity and an aging global population, the prevalence of NAFLD is expected to rise. However, the disease is not solely caused by increased body weight (BW), as the prevalence of NAFLD is higher in younger populations, even those with a normal or low weight. According to a review, the worldwide prevalence of NAFLD is 10-20% in Caucasians and 11-53% in Asian populations. Shi et al. reported a gradual increase in the prevalence of lean-NAFLD from 5.6% to 12.6% after 2000 [2][3][4].
NAFLD is a serious liver disease associated with extrahepatic conditions, and advanced fibrosis resulting from NAFLD can lead to liver-related mortality and hepatocellular carcinoma [5]. Although lean-NAFLD patients are initially considered to have a less severe form of the disease, recent studies have demonstrated a similar long-term prognosis in lean-NAFLD patients and those with obesity [6][7][8]. A large retrospective cohort study also indicated that a lean status can be associated with liver-related adverse events and overall mortality [9].
The fatty liver index (FLI), which considers body mass index (BMI), waist circumference (WC), triglyceride levels, and gamma-glutamyl transferase (GGT), has been widely used to predict NAFLD [10][11][12][13] and has moderate-to-high predictive power for fatty liver in lean populations [14]. Although BMI is frequently used to categorize obesity, it is not a sufficient indicator of central obesity. Moreover, considering that lean individuals often have normal body size but may have abnormalities in visceral fat, BMI cannot be used as the sole indicator to assess the severity and prognosis of NAFLD. One review article suggested visceral adiposity may be a critical risk factor for lean-NAFLD [15]. Furthermore, previous studies have shown that compared to BMI and WC, waist-hip ratio (WHR) is a superior predictor of visceral fat, which is strongly linked to fatty liver [16,17]. Zheng et al. [18] also reported a strong association between WHR and NAFLD, with WHR being an indicator of hepatic steatosis risk, even in adolescents [19,20]. Therefore, measuring and monitoring visceral adiposity may provide a better understanding of the relationship between body composition and health outcomes.
This study aimed to combine representative visceral fat measures with other laboratory parameters, using the least absolute shrinkage and selection operator (LASSO) method, to construct a predictive model for lean-NAFLD with greater clinical utility than the FLI in the lean population.

Subjects
This retrospective cross-sectional study was conducted using the de-identified medical records of 51,271 examinees who underwent routine health checkups at the Health Examination Center of Kaohsiung Veterans General Hospital between January 1, 2016, and December 31, 2020. The flow chart of the study is shown in Figure 1. We excluded patients whose records indicated at least one of the following: (1) significant consumption of alcohol, defined as >20 g/d for men and >10 g/d for women, according to the National Health and Nutrition Examination Survey III criteria [21]; (2) liver cirrhosis (defined by ultrasonographic criteria); (3) chronic hepatitis B or C (defined by history, serum hepatitis B surface antigen, and anti-hepatitis C antibody); (4) liver cancer; and (5) lack of ultrasonographic examination in the health checkup data, repeat measurements, or incomplete data. Lastly, 2325 examinees with a BMI <24 kg/m² were enrolled in the lean patient population. NAFLD was defined as an ultrasonographic diagnosis of a fatty liver. This study was approved by the Institutional Review Board of the Kaohsiung Veterans General Hospital (no. 21-ct2-08(210125-1)). Written consent from the study patients was not necessary as the dataset consisted of de-identified data for research purposes.

Figure 1. Flow chart of the study. BMI: body mass index.

Measurements
The weight, height, and body fat mass of all examinees were measured using an electrical impedance analyzer (X-SCAN PLUS II; Jawon Medical, Gyeongsan-si, South Korea), with the patients wearing minimal clothes and no socks. BMI was calculated as weight (kg) divided by height (m) squared. Well-trained examiners measured the WC of all examinees at the umbilical level using a non-stretchable tape measure without exerting pressure on the body surface. A trained examiner measured all anthropometric indices. Abdominal ultrasonographic examinations to determine hepatic fat infiltration were performed by the same five experienced ultrasonographic technicians using a GE LOGIQ E9 ultrasound machine (GE Healthcare, Chalfont St. Giles, United Kingdom). The measurements were verified by five experienced senior radiologists, each with 10 years of experience. The criteria for the diagnosis and severity of fatty liver on ultrasonography were established according to the practice guidelines of the American Gastroenterological Association.
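As a small illustration of the anthropometric indices described above, the following sketch computes BMI and applies the lean threshold of 24 kg/m² used for enrolment. The function names are hypothetical, and the waist-hip ratio helper assumes waist and hip circumferences measured in the same unit; hip measurement is not described in the text.

```python
LEAN_BMI_CUTOFF = 24.0  # kg/m^2, the lean threshold used for enrolment in this study


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def waist_hip_ratio(waist: float, hip: float) -> float:
    """Waist-hip ratio (assumption: both circumferences in the same unit)."""
    if hip <= 0:
        raise ValueError("hip circumference must be positive")
    return waist / hip


def is_lean(weight_kg: float, height_m: float) -> bool:
    """True when BMI falls below the study's lean cut-off."""
    return bmi(weight_kg, height_m) < LEAN_BMI_CUTOFF


print(round(bmi(60.0, 1.65), 2), is_lean(60.0, 1.65))
```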
The overall dataset (n = 2325) was randomly divided into two groups: the training cohort (n = 1162) and the validation cohort (n = 1163).
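A random 1:1 split of this kind can be sketched as follows. This is only an illustration: the seed, the helper name, and the use of integer subject identifiers are assumptions, and the text does not state how the randomization was actually performed.

```python
import random


def split_half(subject_ids, seed=0):
    """Shuffle the identifiers and cut them into two near-equal halves."""
    ids = list(subject_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    cut = len(ids) // 2
    return ids[:cut], ids[cut:]


# 2325 subjects give 1162 in the first half and 1163 in the second.
train_ids, valid_ids = split_half(range(2325), seed=42)
print(len(train_ids), len(valid_ids))
```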

LASSO-derived prediction model
We analyzed 23 features, including age, sex, BMI, WC, BW, body fat, WHR, history of hypertension (HTN) and diabetes mellitus (DM), fasting glucose, exercise frequency, alcohol drinking frequency, platelet count, alkaline phosphatase, HbA1c, cholesterol, HDL-C, LDL-C, triglyceride levels, uric acid, muscle mass, visceral fat, and total cholesterol/HDL-C ratio. LASSO regression was applied to these features with the optimal lambda value, defined as the value that minimized the cross-validation error. This procedure retained four features (visceral fat, triglyceride levels, HDL-C, and WHR), which were used to construct the new prediction model. Its prediction accuracy and discriminatory ability were then compared with those of the FLI model.
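The idea of L1-penalized feature selection with a cross-validated lambda can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation (the analyses were run in SPSS and Stata). It uses proximal gradient descent on synthetic data, and the lambda grid, step size, fold scheme, and held-out log-loss criterion are all assumptions chosen for illustration.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink toward zero; values inside the band become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(row, intercept, weights):
    return sigmoid(intercept + sum(w * x for w, x in zip(weights, row)))


def fit_l1_logistic(X, y, lam, step=0.5, n_iter=200):
    """Proximal gradient descent for L1-penalized logistic regression."""
    n, p = len(X), len(X[0])
    intercept, weights = 0.0, [0.0] * p
    for _ in range(n_iter):
        resid = [predict(row, intercept, weights) - yi for row, yi in zip(X, y)]
        intercept -= step * sum(resid) / n  # the intercept is not penalized
        for j in range(p):
            grad_j = sum(r * row[j] for r, row in zip(resid, X)) / n
            weights[j] = soft_threshold(weights[j] - step * grad_j, step * lam)
    return intercept, weights


def mean_log_loss(X, y, intercept, weights):
    eps = 1e-12
    total = 0.0
    for row, yi in zip(X, y):
        p_hat = min(max(predict(row, intercept, weights), eps), 1.0 - eps)
        total -= yi * math.log(p_hat) + (1 - yi) * math.log(1.0 - p_hat)
    return total / len(y)


def choose_lambda(X, y, lambdas, k=5):
    """Return the lambda with the lowest mean held-out log-loss over k folds."""
    folds = [list(range(i, len(y), k)) for i in range(k)]
    best_lam, best_err = None, float("inf")
    for lam in lambdas:
        err = 0.0
        for held in folds:
            held_set = set(held)
            keep = [i for i in range(len(y)) if i not in held_set]
            b, w = fit_l1_logistic([X[i] for i in keep], [y[i] for i in keep], lam)
            err += mean_log_loss([X[i] for i in held], [y[i] for i in held], b, w)
        err /= k
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam


# Synthetic, roughly standardized data: feature 0 is informative, 1 and 2 are noise.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(120)]
y = [1 if row[0] + 0.5 * rng.gauss(0, 1) > 0 else 0 for row in X]
lambdas = [0.3, 0.1, 0.03, 0.01]  # hypothetical grid
best_lambda = choose_lambda(X, y, lambdas)
intercept, weights = fit_l1_logistic(X, y, best_lambda)
print(best_lambda, weights)
```

Features whose coefficients are shrunk exactly to zero are dropped, which is how a LASSO fit can reduce 23 candidate features to a small subset.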

FLI model
Bedogni et al. [22] originally developed the FLI, which could accurately distinguish patients with NAFLD. Yang et al. [23] also suggested that the FLI was a reliable non-invasive predictor of NAFLD in both Asian and Western populations.
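The FLI was calculated using the following formula:
$$\mathrm{FLI} = \frac{e^{y}}{1 + e^{y}} \times 100, \qquad y = 0.953 \ln(\mathrm{TG}) + 0.139\,\mathrm{BMI} + 0.718 \ln(\mathrm{GGT}) + 0.053\,\mathrm{WC} - 15.745$$
where TG is the triglyceride level (mg/dL), BMI is in kg/m², GGT is in U/L, and WC is in cm.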
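A minimal sketch of the FLI calculation is given below, using the coefficients of the index as published by Bedogni et al. [22]. The function name and the input units (triglycerides in mg/dL, GGT in U/L, WC in cm, BMI in kg/m²) are stated here as assumptions about how the inputs are recorded.

```python
import math


def fatty_liver_index(tg_mg_dl: float, bmi: float, ggt_u_l: float, wc_cm: float) -> float:
    """Fatty liver index on a 0-100 scale (coefficients from Bedogni et al.)."""
    y = (0.953 * math.log(tg_mg_dl)
         + 0.139 * bmi
         + 0.718 * math.log(ggt_u_l)
         + 0.053 * wc_cm
         - 15.745)
    # Logistic transform of the linear predictor, scaled to 0-100.
    return 100.0 / (1.0 + math.exp(-y))


# Hypothetical lean examinee: TG 100 mg/dL, BMI 22 kg/m^2, GGT 20 U/L, WC 80 cm.
print(round(fatty_liver_index(100.0, 22.0, 20.0, 80.0), 2))
```

For the hypothetical examinee above the index is roughly 13, which falls below the cut-off of 15.45 used for the FLI in this study.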

Statistical analysis
Student's t-tests were used to compare continuous clinical and demographic variables between subjects in the training and validation groups. Chi-squared and Fisher's exact tests were used for categorical variables. The primary study outcome was the development of a LASSO-derived prediction model (optimal lambda selection) for lean-NAFLD in the Asian population. Multiple logistic regression models were applied to estimate the odds ratios (ORs) and 95% CIs. We evaluated and compared the predictive models using the C-statistic (AUC), Akaike information criterion (AIC), and Bayesian information criterion (BIC). Models with higher C-statistics and lower AIC/BIC values were regarded as having a higher discriminatory ability. C-statistic values range from 0.5 (no ability to discriminate) to 1.0 (full ability to discriminate). The Hosmer-Lemeshow goodness-of-fit statistic was used for calibration.
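The comparison metrics named above can be sketched in a few lines. The C-statistic is computed here as the proportion of case/non-case pairs in which the case receives the higher predicted score (ties count one half), and AIC and BIC follow their standard definitions. The function names and toy inputs are illustrative assumptions, not the study's data.

```python
import math


def c_statistic(labels, scores):
    """AUC as the probability that a random case outranks a random non-case."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


print(c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```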
A model was established according to the LASSO-derived parameters in the training cohort. Statistical significance for all tests was set at p < .05. All statistical analyses were performed using SPSS for Windows (version 22.0; SPSS Inc., Chicago, IL, USA) and Stata version 13.0 (StataCorp, College Station, TX, USA).
The statistical approach combined the LASSO algorithm with principal component analysis (PCA) to identify key features associated with the main study outcome. Principal components that collectively accounted for 80% of all features were selected. PCA addressed multicollinearity by transforming highly correlated variables into a set of independent variables, and this process of feature selection and dimensionality reduction is presented as a heatmap (Figure 2). The LASSO algorithm was subsequently used to determine the optimal parameters for model development and the establishment of a nomogram, which helped prevent overfitting due to redundant features. Ultimately, four parameters were chosen to construct the predictive models.
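One common way to apply an 80% threshold in PCA is to keep the smallest number of leading components whose cumulative explained-variance ratio reaches the target. The sketch below assumes that interpretation, which the text does not state explicitly, and takes the component variances (eigenvalues) as given; the example values are made up.

```python
def components_for_variance(eigenvalues, target=0.80):
    """Smallest number of leading components reaching the target variance share."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return k
    return len(ordered)


# Hypothetical eigenvalues: cumulative shares are 0.4, 0.7, 0.9, 1.0.
n_components = components_for_variance([4.0, 3.0, 2.0, 1.0])
print(n_components)
```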

Study population characteristics
Of the 2325 subjects in our study, 852 were men. The prevalence of lean-NAFLD in the study cohort was 12.9% (301 among 2325 subjects). The subjects were randomly assigned to the training or validation groups at a ratio of 1:1. In total, 1162 and 1163 subjects were included in the training and validation groups, respectively. Table 1 results indicate that the values for all variables were significantly different between the two groups, except for exercise frequency and alcohol drinking frequency.

LASSO-derived predictor for lean-NAFLD
To prevent overfitting, we used LASSO regression for parameter selection during model construction. This also addressed the problems of multicollinearity and an excessive number of feature variables, and identified the smallest subset of variables with the strongest explanatory power and the greatest consistency.

LASSO-derived model for nomogram development
The probability of lean-NAFLD in the training cohort, according to the multivariable logistic regression model, was based on 4 predictive factors: visceral fat, triglyceride levels, HDL-C levels, and WHR. Each significant variable was assigned a score on a point scale. The probability of lean-NAFLD was estimated by summing the scores and locating the total on the total point scale. For example, if the patient's visceral fat was 1.45, it was first located on the relevant axis. Next, a straight line was drawn downward to the point axis (5th row, named 'score') to obtain the points based on visceral fat (2 points). Then, we repeated this process for triglyceride levels, HDL-C levels, and WHR. After that, we summed all the points to obtain the 'total score' (the bottom row). Finally, a straight line was drawn upward from the 6th row to determine the probability of developing lean-NAFLD. That is, if a patient had visceral fat of 1.45 (2 points), triglyceride level of 220 (2 points), HDL-C level of 33 (5 points), and WHR of 0.96 (3.5 points), the total score was 12.5 points, corresponding to a 60% probability of lean-NAFLD (Figure 3).
Given that individuals with normal BMI tend to have normal WC, we compared the predictive performance of the LASSO model and the FLI for lean-NAFLD specifically in subjects with normal WC. We constructed an additional table to assess the efficacy of these models in predicting fatty liver when WC is within the normal range (males with WC < 90 cm and females with WC < 80 cm), particularly in populations where a significant proportion exhibit normal waist circumference despite having fatty liver. The cut-off value was 15.45 for the FLI and 0.1484 for the LASSO model. The FLI showed an accuracy of 0.790, with a sensitivity of 0.691 and a specificity of 0.803. The LASSO model showed a slightly higher accuracy of 0.797, with a higher sensitivity of 0.722 and a similar specificity of 0.807. Overall, both models demonstrated moderate accuracy in predicting lean-NAFLD, with the LASSO model slightly more accurate than the FLI model in predicting fatty liver (Supplementary Table 1).
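The performance measures reported at a fixed cut-off can be derived from a 2 × 2 confusion table, as in the sketch below. The predicted probabilities and labels are toy values, not study data, and treating a probability equal to the cut-off as positive is an assumption.

```python
def classify(probabilities, cutoff=0.1484):
    """Label a subject positive when the predicted probability reaches the cut-off."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def diagnostic_metrics(labels, predictions):
    """Sensitivity, specificity, accuracy, and negative predictive value."""
    tp = sum(1 for l, p in zip(labels, predictions) if l == 1 and p == 1)
    fn = sum(1 for l, p in zip(labels, predictions) if l == 1 and p == 0)
    tn = sum(1 for l, p in zip(labels, predictions) if l == 0 and p == 0)
    fp = sum(1 for l, p in zip(labels, predictions) if l == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
        "npv": tn / (tn + fn),
    }


# Toy example: 3 subjects with fatty liver, 5 without.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
probabilities = [0.30, 0.20, 0.10, 0.05, 0.12, 0.16, 0.02, 0.01]
metrics = diagnostic_metrics(labels, classify(probabilities))
print(metrics)
```

Because lean-NAFLD is relatively uncommon in this cohort (12.9%), a high negative predictive value can coexist with only moderate sensitivity and specificity.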

Discussion
Herein, we developed a LASSO-derived model for predicting lean-NAFLD that included four common risk factors. Although several studies [24][25][26] have previously evaluated the risk factors for predicting lean-NAFLD, to our knowledge, this is the first study to use a LASSO-derived model for lean-NAFLD prediction. In addition, we used electrical impedance analyzers to evaluate body composition and fat distribution, which provided a more thorough analysis of both total body fat and its distribution, enabling individuals to gain deeper insights into their overall health and make more informed choices regarding their diet and exercise habits. The major findings were as follows: first, high visceral fat, high triglyceride levels, low HDL-C levels, and high WHR were significantly correlated with lean-NAFLD; second, the four risk factors could be applied to form a suitable and useful nomogram to predict the probability of lean-NAFLD.
In most cases, patients with NAFLD are diagnosed incidentally, either during other medical imaging evaluations or routine annual physical checkups. However, body fat, height, and weight are frequently assessed in clinical settings. The FLI is a formula that calculates the probability of NAFLD based on BMI, WC, GGT, and triglyceride levels, and it aids in the diagnosis of NAFLD without the need for invasive procedures such as liver biopsy [11]. To enhance the predictive capacity for lean-NAFLD, we conducted further investigations in response to the relatively low sensitivity (60%) observed in our previous study when using an FLI cut-off point of 15 [14]. We postulated that unidentified factors, a short follow-up period, and a relatively small sample size may have contributed to this limitation. To address this concern, we employed the LASSO model for predicting lean-NAFLD based on blood tests and anthropometric measurements. In the current study, we further compared the FLI and the LASSO model as predictors of lean-NAFLD by setting the FLI cut-off point at 15.45. Both models demonstrated moderate accuracy in predicting lean-NAFLD. Given the importance of WHR and visceral fat in predicting fatty liver, we further compared the ability of the FLI and the LASSO model to predict NAFLD in subjects with normal WC. In this comparison, the LASSO model was slightly more accurate than the FLI model in predicting fatty liver. Our findings suggest that the LASSO model may serve as an alternative tool for predicting lean-NAFLD, thus providing clinicians with more options for clinical decision-making.
We found that high visceral fat, high triglyceride levels, low HDL-C levels, and high WHR were associated with lean-NAFLD. Although both BMI and visceral adiposity demonstrate high accuracy and performance in identifying patients with lean-NAFLD, BMI is a crude measure of body fat based on height and weight, whereas visceral fat reflects the distribution of body fat and its impact on health. Measuring visceral fat using the LASSO model offers novel insights into body composition and its relationship with health outcomes. Research has shown that individuals with high levels of visceral fat are at an increased risk of chronic health conditions, even if their BMI falls within the 'normal' range [27].
Moreover, previous studies have reported a positive association between visceral fat, notably intra-abdominal fat, and NAFLD [28,29]; with regard to NAFLD, visceral fat is more impactful than BMI [30]. One cohort study with a follow-up period of 4.4 years demonstrated that greater amounts of visceral fat were associated with a higher risk of developing NAFLD [31]. Herein, visceral fat played an important role in lean-NAFLD prediction, as implied by another predictor, WHR. WHR has been reported to be a superior indicator of abdominal obesity and visceral adiposity compared to WC. Increased visceral fat leads to increased insulin resistance with unrestrained lipolysis [32], and more free fatty acids and adipokines/cytokines are released into the portal circulation, which are key factors in NAFLD [33]. Individuals with normal BMI may have high visceral fat, so measuring BMI alone underestimates the risk of NAFLD. Consistent with this, higher visceral fat was more influential than BMI in the lean-NAFLD population in this study.
We also found that patients with lean-NAFLD and those with obesity share common metabolic abnormalities [3]. Low HDL-C levels [34,35] and high triglyceride levels [36] are closely associated with the development of NAFLD. Low HDL-C impairs the removal of cholesterol from liver cells, leading to its accumulation and contributing to fatty liver formation [37]. Elevated triglyceride levels result in increased triglyceride synthesis and reduced clearance, causing triglycerides to accumulate in liver cells and further promote fatty liver development. The interplay between low HDL-C and high triglyceride levels exacerbates the progression of fatty liver disease by impairing lipid metabolism and promoting lipid accumulation within hepatocytes.
Several studies have been conducted to construct prediction models for NAFLD. Cen et al. [38] employed six parameters, namely body fat mass, diastolic blood pressure, serum uric acid, fasting blood glucose, triglyceride levels, and alanine lipase levels, to predict NAFLD. These parameters align with the indicators used in our study, despite differences in the study population (overweight adults vs. normal or underweight adults). Several studies have also developed prediction models for lean-NAFLD. Su et al. [39] utilized a two-class neural network with 10 features, including BMI, WC, weight, age, blood pressure, serum triglyceride levels, serum HDL-C, glucose, and serum glutamic-pyruvic transaminase levels, to predict NAFLD in individuals with a BMI <23 kg/m². Liu et al. [40] developed a nomogram based on seven laboratory profiles, including triglyceride levels. Wang et al. [41] incorporated predictors such as triglycerides and HDL in their nomogram construction. Compared to these studies, our research incorporated a broader range of variables encompassing lifestyle factors, laboratory parameters, WC, visceral fat, and WHR, providing enhanced insights for predicting lean-NAFLD.
This study had some limitations that should be acknowledged. First, this was a single-center retrospective study that focused on an Asian population. Therefore, the applicability of this prediction model to other populations may be limited. Second, we did not assess the long-term outcomes of lean-NAFLD in these populations or the cost-effectiveness of the screening. The relationship between NAFLD and extrahepatic diseases should be investigated in the future. Third, there may be unknown confounding factors, such as genetic factors, that could limit the accuracy of the LASSO model. Nevertheless, the development of this tool can aid in the early identification of individuals at risk of NAFLD. While the LASSO model developed in this study showed comparable predictive power to the FLI, the use of machine learning for NAFLD prediction is a growing trend that allows for the inclusion of a broader range of predictive factors. In future studies, we plan to incorporate more variables to increase the accuracy of NAFLD prediction. Moreover, external validation using data from a second population or large-scale biobanks, such as the UK Biobank, would significantly strengthen our findings and will be a primary focus of our future research to enhance the robustness and applicability of our model.

Conclusion
We developed and validated a LASSO-derived prediction model based on four clinical parameters: visceral fat, triglyceride levels, HDL-C levels, and WHR. This model exhibited good performance in predicting lean-NAFLD in an Asian population. Thus, we provide a personalized risk stratification screening strategy for NAFLD in these low-risk populations.

Figure 2. This heatmap visually represents the correlation matrix of 23 features, including age, sex, BMI, WC, BW, body fat, WHR, history of hypertension (HTN) and diabetes mellitus (DM), fasting glucose, exercise frequency, alcohol drinking frequency, platelet count, alkaline phosphatase, HbA1c, cholesterol, HDL-C, LDL-C, triglyceride levels, uric acid, muscle mass, visceral fat, and total cholesterol/HDL-C ratio. The color spectrum in the heatmap spans from deep blue, denoting positive correlations, to deep red, indicating negative correlations.

Table 2. Demographic data comparison based on non-alcoholic fatty liver status, stratified from training set.

Table 3. Univariable and LASSO-derived multivariable logistic regression for predicting fatty liver in subjects with lean BMI.

Table 4. Prediction performance of LASSO and FLI models, n = 1163.

Table 5. Prediction performance and comparison of LASSO and FLI models.